
Implementation  of  Computer  Assisted  Breast  Cancer  Diagnosis 
(US  Army  Grant  No.  DAMD17-93-J-3007) 


1.  Introduction 

Recently, several investigators have proposed methods for the automatic detection of
microcalcifications and masses on mammograms. Accuracy has improved significantly since the
initial attempts [Chan 1987; 1988] to apply computer algorithms to the detection of
microcalcifications. We believe that it is important to implement the program on a high-speed
workstation and conduct a large-scale clinical trial in order to evaluate its clinical practicability and
limitations. Although the false-positive rate for the detection of masses is still very high, we have been
using an artificial neural network to classify malignant and benign masses. We believe that a
computer program that analyzes features of suspected masses will lead to a more useful and
fundamental approach to computer-aided diagnosis.

Because high-resolution digital mammography produces a large data volume, data
compression is an important means of facilitating mammographic image transmission and storage. We
have studied the characteristics of mammograms and developed compression methods specifically for
mammograms using gray value splitting in conjunction with wavelet and full-frame discrete cosine
transform (DCT) techniques. The effects of applying data compression to the proposed computer-aided
diagnosis (CADx) scheme for the detection of microcalcifications were also tested during this reporting
period.

2.  Research  in  the  Detection  of  Microcalcifications 

2.1. Detection of Suspected Microcalcifications

Microcalcifications in breast cancer are reported to occur as clusters of five or more
microcalcifications within a 1 cm² area [Black 1965, Fisher 1975]. When the digitization pixel size is
50 µm (using a Lumiscan 150), there are 40,000 pixels in a 1 cm² area. For five detections or pixels
(0.0125%) in the area to possess high intensity, one should set the threshold on pixel intensity at
approximately 3.61 σ (σ: standard deviation). In one experiment, we used 3.02 σ as the threshold,
corresponding to a maximum of 50 pixels (0.125%, as indicated in Figure 1), because a larger
microcalcification may contain several adjacent detected pixels. Note that a background trend correction
was applied to each image block prior to the statistical calculation. The previously detected suspected
areas (i.e., 50 pixels) were masked with the mean value in this detecting procedure. This procedure was
performed with a 1 cm² template (200 × 200 pixels), moving 190 pixels per step for each operation and
scanning through the mammogram horizontally and then vertically.
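
The template scan described above can be sketched as follows. This is a minimal illustration, not the report's program: the function names are hypothetical, and the handling of the final partial step (one extra window placed flush with the far edge) is an assumption, since the text does not say how image borders were treated.

```python
def window_starts(length, window=200, step=190):
    """Start offsets of a sliding template along one image axis."""
    if length < window:
        raise ValueError("image axis is shorter than the template")
    starts = list(range(0, length - window + 1, step))
    # Assumption: add a last window flush with the far edge so that
    # no pixels near the border are left unscanned.
    if starts[-1] + window < length:
        starts.append(length - window)
    return starts


def scan_blocks(width, height, window=200, step=190):
    """Yield (left, top) corners, moving horizontally first, then vertically."""
    for top in window_starts(height, window, step):
        for left in window_starts(width, window, step):
            yield left, top


# A 4,096 x 5,120 mammogram scanned with a 200 x 200 template, 190 pixels per step.
blocks = list(scan_blocks(4096, 5120))
```

With a 190-pixel step, adjacent 200-pixel templates overlap by 10 pixels, so a small object lying on a block boundary still falls wholly inside at least one template.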


Figure 1. Assuming the noise spectrum fits a Gaussian distribution, only 0.125% of pixels have an
intensity higher than 3.02 σ.
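
The link between an allowed number of pixels and a threshold in units of σ can be checked with a short sketch. It assumes a one-sided Gaussian tail, as in Figure 1; the function names are hypothetical, and the block mean and standard deviation are assumed to be measured after background trend correction.

```python
from statistics import NormalDist


def sigma_threshold(n_pixels, block_pixels=200 * 200):
    """Number of standard deviations above the mean exceeded by n_pixels
    out of block_pixels, assuming Gaussian noise (one-sided tail)."""
    tail_fraction = n_pixels / block_pixels
    return NormalDist().inv_cdf(1.0 - tail_fraction)


def block_threshold(mean, std, n_pixels):
    """Intensity threshold for one trend-corrected block."""
    return mean + sigma_threshold(n_pixels) * std


k_50 = sigma_threshold(50)    # about 3.02 for 50 pixels (0.125%)
k_120 = sigma_threshold(120)  # about 2.75 for 120 pixels (0.3%)
```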

After carefully evaluating twenty-two mammograms containing subtle microcalcifications (only
three clustered microcalcifications on three mammograms were associated with a malignant process), we
found that 3.02 σ was a suitable threshold value except in radiolucent regions (OD > 2.3),
where the threshold should be set at 2.75 σ, corresponding to 120 pixels (0.3%) in a 1 cm² area. In
addition, when a large area was detected (> 30 pixels), additional pixels corresponding to the area
would be granted in the local operation. Our results indicated that all microcalcifications (27 clusters
confirmed by biopsy and 126 singles confirmed by an experienced radiologist) were detected
through the above procedure. However, an average of 858 suspected areas per mammogram was
obtained (i.e., a 99.5% false-positive rate for 100% true-positive detection). This procedure is equivalent
to the pre-scan process of a computer-aided diagnosis scheme for the detection of microcalcifications
[Chan 1987; 1990]. The important point here is that we have developed an effective computer program
that detected all microcalcifications in this data set. It takes 5-7 seconds on a DEC Alpha computer to
process a digital mammogram of 4,096 × 5,120 pixels. The suspected areas will be used for the further
evaluation of CADx using stricter criteria and, in the next section, for error handling in mammographic
image compression.


3.  Adaptive  Lossless  Mammographic  Image  Compression 

We have also developed an adaptive lossless compression scheme for mammograms by
combining a high compression method with techniques involving the detection of all suspected
microcalcifications to ensure data accuracy in the clinically significant areas. In the previous section, we
described how to detect suspected microcalcifications. Handling 858 suspected areas is a small task
compared to the compression of a 4K × 5K mammogram (see Section 3), yet it allows us to
preserve the maximum data accuracy in clinically significant areas. This type of error control should be
used in any medical image compression scheme when possible.

3.1. Mammographical Image Compression via Wavelet Decomposition

Recently, we have used a wavelet transform for mammographic image compression [Daubechies
1988, Mallat 1989, Cody 1992, Antonini 1992]. Before the wavelet transform, the boundary of the
breast was outlined. Only the area within the boundary was compressed. Figure 2 shows
a typical multi-level wavelet transform and the associated compression procedure. The larger the image,
the more levels of wavelet transform can be applied. In general, “A” occupies much less computer
space than “B”, and “A” space + “B” space is about 4K × 5K × 3 bits (a compression ratio of 4:1). If the
air region is included in the compression process, the average error-free compression ratio is ~2.5:1.


Figure 2. A typical wavelet decomposition and associated compression procedure for a mammogram.
(Note: only a two-level decomposition is shown.) Diagram labels: "Bit allocation, quantization, and
error-free coding"; "Quantization errors can be encoded by an error-free coding".


In this study, we decomposed each image with a 7-level wavelet transform; hence, the smallest
image will be a matrix of 128 × 160 pixels. The lowest-resolution subimage will be further decomposed
by an operation called differential pulse code modulation (DPCM). The entropy of all decomposed
subimages will be calculated to determine the best wavelet kernel for mammographic image
compression.
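
As an illustration of why DPCM and an entropy measure are paired here, the sketch below applies a simple previous-sample DPCM predictor to one row of values and compares first-order entropies. The predictor and the function names are assumptions made for illustration; the report does not specify which DPCM predictor was used.

```python
import math
from collections import Counter


def dpcm_encode(row):
    """Replace each sample by its difference from the previous sample."""
    residuals, previous = [], 0
    for value in row:
        residuals.append(value - previous)
        previous = value
    return residuals


def dpcm_decode(residuals):
    """Invert dpcm_encode exactly (lossless)."""
    row, previous = [], 0
    for r in residuals:
        previous += r
        row.append(previous)
    return row


def entropy_bits(values):
    """First-order Shannon entropy in bits per sample."""
    counts = Counter(values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


row = [10, 11, 12, 13, 14, 15, 16, 17]  # a smooth ramp
residuals = dpcm_encode(row)            # [10, 1, 1, 1, 1, 1, 1, 1]
```

On a smooth ramp the residuals are nearly constant, so their entropy is far lower than that of the raw samples; the same measure can rank candidate wavelet kernels by the entropy of the subimages they produce.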

3.2.  Error-Controlled  Compression  for  Digital  Mammograms 

We  believe  that  an  accurate  error-control  procedure  is  an  innovative  solution  to  make  a 
compression  scheme  clinically  useful.  A  computer  scheme  for  the  compression  was  tested  and  is 
described  as  follows: 

(a)  Detect  all  suspected  microcalcifications  (clusters  and  singles)  based  on  the  method  described 
in  Section  2. 

(b)  Perform  an  error-free  compression  using  DPCM  and  arithmetic  coding  on  the  detected  areas. 
Replace  the  area  with  surrounding  intensity  using  cubic  spline  interpolation. 

(c) Perform a multi-level wavelet transform on the mammogram.

(d) Perform quantization in the wavelet domain. (The higher the level of a low-resolution
subimage, the less destructive the quantization applied to it should be.)

(e) Perform entropy coding on the quantized subimages to obtain file “A” indicated in Figure 2
(arithmetic coding [Witten 1987] for uncorrelated coefficients and L-Z coding [Ziv 1978] for
correlated data sequences).
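
Step (d) and the error-free coding of quantization errors noted in Figure 2 can be sketched as below. This is a sketch under stated assumptions: integer coefficients, a uniform quantizer, and a hypothetical rule (`step_for_level`) that halves the step size at each coarser level. None of these parameters come from the report.

```python
def step_for_level(level, finest_step=16):
    """Hypothetical rule: halve the quantizer step at each coarser level
    (level 1 = finest detail), never going below 1."""
    return max(1, finest_step >> (level - 1))


def quantize(coeffs, step):
    """Uniform quantization of integer coefficients to indices."""
    return [round(c / step) for c in coeffs]


def dequantize(indices, step):
    return [q * step for q in indices]


coeffs = [37, -5, 120, 0, -64]
step = 8
indices = quantize(coeffs, step)
approx = dequantize(indices, step)
# Quantization errors; coding these without loss restores the exact data.
errors = [c - a for c, a in zip(coeffs, approx)]
restored = [a + e for a, e in zip(approx, errors)]
```

The index stream corresponds to the lossy part of the scheme; wherever the errors are also stored, reconstruction is exact.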

3.3.  Experimental  results 

The unique point of this work is the addition of an error-free feature for the suspected disease
areas to a compression scheme. No compression artifact should be observable by an experienced breast
radiologist. One must realize that there is no need to digitize at a resolution as high as 50 µm/pixel
except in those areas containing subtle microcalcifications. However, the error-control feature
somewhat reduces the overall compression efficiency (ratio). Equation (1) provides a formula to
calculate the effective compression ratio when the error-control feature is added to the compression
system.

$$R_t = \frac{R \times R_e \times T}{(R - R_e) \times N \times S + R_e \times T} \qquad (1)$$

where T is the total number of pixels in the original mammogram, S is the number of pixels in the
suspected area for error-free encoding, N denotes the number of suspected areas, R is the compression
ratio obtained by performing a transform (wavelet) coding, Re is the average compression ratio to
error-free encode microcalcification areas, and Rt is the total effective compression ratio.

We tested the same twenty-two mammograms as used in Section 2. We calculated the effective
compression ratio using the following values:

N = 858;

S = 640 (~25 × 25 pixels), which was averaged from 81% tiny suspects requiring 20 × 20 pixels
(i.e., a 1 mm × 1 mm area) and 19% medium-sized suspects requiring 40 × 40 pixels;

T = 20,971,520 (4,096 × 5,120);

Re = 2.5;

R = 40:1 (estimated acceptable compression ratio), which is partly due to the fact that ~50% of
the mammogram contains air space.

Substituting the above values into Equation (1), we obtained Rt ≈ 29, which also indicates that the
compressed data grew by about 40% when the error-free feature was added to the
compression scheme. Since each 12-bit datum is stored in a 16-bit computer space, Rt was 38 for
current commercial data systems. Because the suspected areas may contain significant clinical
information, we believe that the error-control feature is necessary and is a cost-effective approach to
mammography data reduction.
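
The arithmetic above can be reproduced with a few lines. The sketch treats the compressed size as (T − N·S)/R for the transform-coded pixels plus N·S/Re for the error-free areas, which is the reading under which the quoted values give Rt ≈ 29; the function and variable names are illustrative.

```python
def effective_ratio(R, Re, T, N, S):
    """Total effective compression ratio when N areas of S pixels each are
    coded error-free at ratio Re and the remaining pixels at ratio R."""
    return (R * Re * T) / ((R - Re) * N * S + Re * T)


rt = effective_ratio(R=40.0, Re=2.5, T=4096 * 5120, N=858, S=640)

# Growth of the compressed file relative to plain 40:1 transform coding.
overhead = 40.0 / rt - 1.0

# Ratio relative to 12-bit data held in 16-bit storage words.
rt_16bit = rt * 16 / 12
```

Here `rt` is about 28.7, `overhead` about 0.39, and `rt_16bit` about 38.3, in line with the figures quoted in the text.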


4.  Recognition  of  Mammographic  Microcalcifications  with  an  Artificial  Neural  Network 

We  have  developed  a  computer-aided  diagnosis  (CADx)  program  for  automated  detection  of 
clustered  microcalcifications  in  digital  mammograms.  In  this  study,  we  investigated  the  use  of  a 
convolution  neural  network  (CNN)  in  conjunction  with  the  CADx  program  to  reduce  false-positive  (FP) 
detections. 

Screen-film mammograms containing subtle microcalcifications were digitized with a laser film
scanner. After signal-to-noise ratio (SNR) enhancement and background removal with a spatial filter,
potential signal sites were detected with a locally adaptive gray-level thresholding technique. Size
and contrast were used to discriminate false signals from true microcalcifications. The remaining signals
were then inspected by the CNN. Image blocks containing individual microcalcifications in the
SNR-enhanced images were input to the CNN. The CNN consisted of nodes organized in groups, and the
weights connecting the nodes were organized as convolution kernels. These weights integrated
neighborhood information for recognition of the true signals. After training, we found that a CNN with
two hidden layers, each containing 10 groups of nodes, was effective in the classification of true and
false signals. The output signals from the CNN then underwent a regional clustering algorithm for
detection of clustered microcalcifications.
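
To make concrete how weights organized as a convolution kernel integrate neighborhood information, the sketch below computes one group of nodes (one feature map) from an image block. The kernel size, the sigmoid activation, and all names are assumptions for illustration only; the report does not give these details, and training is not shown.

```python
import math


def conv2d_valid(image, kernel):
    """Slide the kernel over the image (no padding); each output node sums
    a weighted neighborhood, with the same weights shared by every node."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def feature_map(image, kernel, bias=0.0):
    """One group of nodes: shared-kernel sums passed through a sigmoid."""
    return [[sigmoid(v + bias) for v in row]
            for row in conv2d_valid(image, kernel)]


block = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
kernel = [[1, 0], [0, 1]]
sums = conv2d_valid(block, kernel)
```

Strictly, this is a cross-correlation (the kernel is not flipped), which is the usual convention in neural-network code.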

We found that the CNN could classify individual microcalcifications with an area under the ROC
curve, Az, of 0.88. FROC analysis showed that the addition of CNN classification to the CADx
program reduced false-positive cluster detections by 60-70% for a given true-positive rate. After
adding a criterion requiring a minimum of 3 calcifications in one cluster for a detection, the Az
increased to 0.96. These results indicate that the CNN can significantly increase the accuracy of the
CADx program.
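
A minimum-count cluster criterion of this kind can be sketched as follows. The regional clustering algorithm itself is not described in the report, so the rule used here (keep a detection only if at least three detections, itself included, fall within a square window centered on it) and the 200-pixel window (1 cm at 50 µm per pixel) are assumptions.

```python
def clustered_detections(points, window=200, min_count=3):
    """Keep detections that have at least min_count detections (including
    themselves) inside a window x window square centered on them."""
    half = window / 2.0
    kept = []
    for (x, y) in points:
        neighbours = sum(
            1 for (u, v) in points
            if abs(u - x) <= half and abs(v - y) <= half
        )
        if neighbours >= min_count:
            kept.append((x, y))
    return kept


detections = [(10, 10), (50, 40), (90, 20), (1000, 1000)]
clusters = clustered_detections(detections)
```

In this example the isolated detection at (1000, 1000) is discarded, while the three nearby detections are retained as one candidate cluster.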

5.  Computer-Aided  Diagnosis  in  Mammography:  Classification  of  Mass  and  Normal 
Tissue  by  Texture  Analysis 

Computer-aided diagnosis schemes are being developed to assist radiologists in mammographic
interpretation. In this study, we investigated whether texture features could be used to reliably
distinguish between mass and non-mass regions in clinical mammograms. Forty-five regions of interest
(ROIs) containing true masses with various degrees of visibility and 135 ROIs containing normal breast
parenchyma were extracted manually from digitized mammograms as case samples. The spatial gray level
dependence (SGLD) matrix of each ROI was calculated, and eight texture features were calculated from
the SGLD matrix. The pair-covariance and class-distance properties of the extracted texture features were
analyzed.
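
An SGLD matrix (also known as a gray-level co-occurrence matrix) and two commonly used texture features can be sketched as below. The eight features used in the study are not listed in the report, so energy and contrast are shown only as familiar examples; the displacement, the number of gray levels, and the function names are likewise assumptions.

```python
def sgld_matrix(image, dx=1, dy=0, levels=4):
    """Normalized co-occurrence matrix: entry [i][j] is the fraction of
    pixel pairs separated by (dx, dy) with gray levels i and j."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    total = 0
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
                total += 1
    return [[n / total for n in row] for row in counts]


def energy(p):
    """Angular second moment: large for uniform texture."""
    return sum(v * v for row in p for v in row)


def contrast(p):
    """Weighted by squared gray-level difference: large for busy texture."""
    return sum((i - j) ** 2 * v
               for i, row in enumerate(p) for j, v in enumerate(row))


flat = sgld_matrix([[0, 0], [1, 1]], levels=2)     # no horizontal change
striped = sgld_matrix([[0, 1], [0, 1]], levels=2)  # change at every pair
```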

Selected  texture  features  were  input  into  a  modified  decision  tree  classification  scheme.  The 
performance  of  the  classifier  was  evaluated  for  different  feature  combinations  and  orders  of  features  on 
the  tree.  A  classification  accuracy  of  about  89%  sensitivity  and  76%  specificity  was  obtained  for  certain 
groups  of  ordered  features  during  the  training  procedure.  With  a  leave-one-out  method,  the  test  result 
was  about  76%  sensitivity  and  64%  specificity.  The  results  of  this  preliminary  study  demonstrate  the 
feasibility  of  using  texture  information  for  distinguishing  masses  from  normal  breast  parenchyma. 
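
The leave-one-out procedure can be sketched generically as follows. The classifier here is a stand-in (a single-feature midpoint threshold), not the modified decision tree used in the study, and the sample values are invented purely to exercise the code.

```python
def train_midpoint(xs, ys):
    """Stand-in classifier: threshold halfway between the class means."""
    pos = [x for x, y in zip(xs, ys) if y == 1]
    neg = [x for x, y in zip(xs, ys) if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2.0


def predict_midpoint(threshold, x):
    return 1 if x > threshold else 0


def leave_one_out(xs, ys, train, predict):
    """Train on all samples but one, test on the held-out sample, repeat
    for every sample, and return (sensitivity, specificity)."""
    tp = fn = tn = fp = 0
    for i in range(len(xs)):
        model = train(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        guess = predict(model, xs[i])
        if ys[i] == 1:
            if guess == 1:
                tp += 1
            else:
                fn += 1
        else:
            if guess == 0:
                tn += 1
            else:
                fp += 1
    return tp / (tp + fn), tn / (tn + fp)


# Invented feature values: label 1 = mass, label 0 = normal parenchyma.
xs = [1.0, 2.0, 3.0, 9.0, 6.0, 7.0, 8.0]
ys = [0, 0, 0, 0, 1, 1, 1]
sensitivity, specificity = leave_one_out(xs, ys, train_midpoint, predict_midpoint)
```

Because each sample is scored by a model that never saw it, leave-one-out figures are typically lower than training-set figures, as in the results reported above.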


6.  Classification  of  Mass  and  Non-Mass  Regions  on  Mammograms  using  an  Artificial 
Neural  Network 

We evaluated the feasibility of using an error-backpropagation-based Artificial Neural Network
(ANN) classifier to detect mass regions on mammograms. Regions of interest (ROIs), which included
masses and normal breast parenchyma, were manually extracted from a database consisting of 87 clinical
mammograms. Texture features based on a spatial gray level dependence matrix were calculated and
input into an ANN using a supervised back-propagation training method. The data were divided into five
groups, and different combinations of these groups formed four sets of training data and test data. We
evaluated the performance of the ANN with different combinations of input features, numbers of hidden
layers, and numbers of nodes in each layer. Using five input features, one hidden layer with ten nodes,
and an output layer with two nodes, we achieved on average a true-positive fraction of 84% at a
false-positive fraction of 34% with an ambiguity rate of 5%. This pilot study paves the way for further
studies in the classification of different types of masses and normal breast parenchyma when a large
data set that includes enough samples for each case becomes available.
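
The 5-10-2 network layout can be sketched as a forward pass. Everything beyond the layer sizes is an assumption: the sigmoid activation, the random untrained weights, and in particular the `decide` rule, which labels a case ambiguous when the two outputs are close; the report does not define how ambiguity was determined. Back-propagation training is not shown.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def layer(inputs, weights, biases):
    """Fully connected layer followed by a sigmoid."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def make_weights(n_out, n_in, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]


def forward(features, w_hidden, b_hidden, w_out, b_out):
    return layer(layer(features, w_hidden, b_hidden), w_out, b_out)


def decide(outputs, margin=0.1):
    """Hypothetical decision rule on the two output nodes."""
    mass, normal = outputs
    if abs(mass - normal) < margin:
        return "ambiguous"
    return "mass" if mass > normal else "normal"


rng = random.Random(0)
w_hidden, b_hidden = make_weights(10, 5, rng), [0.0] * 10  # 5 inputs -> 10 hidden
w_out, b_out = make_weights(2, 10, rng), [0.0] * 2         # 10 hidden -> 2 outputs
outputs = forward([0.1, 0.2, 0.3, 0.4, 0.5], w_hidden, b_hidden, w_out, b_out)
```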


7.  Status  Report  in  the  Implementation  of  CADx  for  the  Detection  of  Clustered 
Microcalcifications 

We continue to work on the CADx program on a DEC Alpha workstation. The basic user
interface is complete. However, it still requires suggestions and modifications from our clinical
collaborators. The user interface can select a mammogram and display it on the workstation. Several
image functions have been implemented: (1) "window and level" for the adjustment of brightness and
contrast, (2) pan, and (3) a cursor box for the user to select the area of interest. The computer-aided
detection program is nearly complete. The clinical trial will start on February 15, 1995 at the Breast
Imaging Division of Georgetown University Hospital.
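
A "window and level" display function of the kind listed above is commonly implemented as a clamped linear mapping. The sketch below is a generic version with hypothetical parameter names, not the workstation's code.

```python
def window_level(pixel, window, level, out_max=255):
    """Map a stored pixel value to a display value: values below the window
    go to 0, values above go to out_max, and the rest scale linearly."""
    low = level - window / 2.0
    high = level + window / 2.0
    if pixel <= low:
        return 0
    if pixel >= high:
        return out_max
    return round((pixel - low) / (high - low) * out_max)


# A 1000-wide window centered on gray value 2000 of a 12-bit image.
displayed = [window_level(p, window=1000, level=2000) for p in (1000, 1750, 3000)]
```

Narrowing the window raises displayed contrast, and moving the level shifts brightness.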

8. Contractual (SOW) Issues

Dr. R.V. Shah, chief breast radiologist at Brooke Army Medical Center, and Dr. Don Smith,
attending breast radiologist at Madigan Army Medical Center, have agreed to send us some proven cases
(in the Spring of 1995) associated with mammographic microcalcifications for inclusion in our test
database [Private Communication]. We will provide our software for evaluation at the Army hospitals
once they are ready to start the experiment.

9.  Conclusions  of  the  Annual  Report  and  Future  Work 

During the last year, we have devoted our effort not only to algorithm improvement but also to
merging our newly developed algorithms, written in C, with useful code previously developed by
Dr. Chan and her colleagues.

At this point, we have advanced our mammographical image compression and CADx research in
terms of algorithm improvement and computer speed. Database collection is underway and will continue
up to the final stage of this project. Several basic functions and the user interface have been
implemented on the workstation. The CADx programs are ready to undergo a clinical trial.

We  will  spend  most  of  our  research  time  evaluating  the  effect  of  CADx  using  the  proposed 
computer  scheme  and  continuing  compression  research  including: 

9.1. Improvement in the Detection of Suspected Microcalcifications

We  plan  to  improve  the  algorithm  for  the  detection  of  suspected  microcalcifications  indicated  in 
Section  2.  Two  methods  along  this  research  direction  will  be  evaluated: 

(i)  To  isolate  large  objects  (such  as  ducts,  macrocalcifications,  isolated  mass,  and  other  foreign 
objects  which  are  radiodense)  prior  to  the  statistical  analysis  of  each  region.  This  can  be  done 
using  a  combined  technique  involving  segmentation-based  image  processing  and  contour 
extraction. 

(ii)  When  clustered  microcalcifications  are  found  in  a  given  area,  the  standard  deviation  threshold 
level  will  be  decreased  (i.e.,  higher  sensitivity  level  will  be  set)  to  include  possible  low  intensity 
calcifications  in  the  region. 

9.2. Effectiveness of the Adaptive Lossless Compression for Mammograms

We anticipate that the decompressed digital mammograms will be reviewed in routine
mammographic reading. The contrast enhancement function and magnification view will be used for
viewing the decompressed and difference (between original and decompressed) images. When operating
the contrast enhancement function, the mean level will be driven to a low intensity (say, the intensity
of the outer boundary of the skin) to investigate low-intensity artifacts, including possible minor
structure changes and blocky artifacts between a suspected area (error-free encoded) and its surrounding
parenchyma. In the investigation of the difference image, the objective is to observe any structure. The
ideal difference image is pure random noise, excluding the error-free encoded areas, in both the spatial
and frequency domains. We will also use Fourier analysis to investigate each difference image.
Quantization levels will be varied to investigate the acceptable threshold using the studies indicated
above. Without rigorous technical and clinical evaluations of the proposed compression method, no
preset compression efficiency (ratio) will be recommended. These studies will only be used for the
adjustment of compression parameters for technical use. A more extensive clinical study involving
receiver operating characteristic (ROC) analysis and subjective clinical readings [MacMahon 1991,
Swets 1982] will be conducted in a future project.
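
The idea of looking for structure in a difference image with Fourier analysis can be illustrated on a single row. The sketch uses a direct discrete Fourier transform for clarity (a real study would use a fast 2-D transform), and the sample values are invented: a periodic error pattern, such as a blocky artifact, concentrates power at one frequency, whereas pure noise spreads it across all frequencies.

```python
import cmath


def difference_row(original, decompressed):
    """Pixel-wise difference between original and decompressed samples."""
    return [a - b for a, b in zip(original, decompressed)]


def power_spectrum(samples):
    """Squared magnitude of the discrete Fourier transform (direct O(n^2) form)."""
    n = len(samples)
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * i / n)
                for i, x in enumerate(samples))) ** 2
        for k in range(n)
    ]


# Invented example: the decompressed row misses an alternating pattern.
diff = difference_row([5, 3, 5, 3], [4, 4, 4, 4])  # [1, -1, 1, -1]
spectrum = power_spectrum(diff)
```

Here all of the power falls in the single bin `spectrum[2]`, which flags the periodic structure; a structure-free difference row would show no such dominant peak.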

In  addition,  we  will  include  the  detection  of  suspected  masses  in  the  proposed  adaptive 
compression  scheme  in  our  research  work  for  the  next  year. 